NSF PAR Search | NSF Public Access Repository

HARP 3.0: Generalizing I/O and API Support for Machine Learning in Digital Audio Workstations

Cwitkowitz, Frank; Benetatos, Christodoulos; Deng, Qixin; Yu, Huiran; Pruyne, Nathan; O’Reilly, Patrick; Garcia, Hugo Flores; Duan, Zhiyao; Pardo, Bryan (December 2025, NeurIPS 2025 Workshop on AI for Music)

Free, publicly-accessible full text available December 1, 2026

Text2FX: Harnessing CLAP Embeddings for Text-Guided Audio Effects

https://doi.org/10.1109/ICASSP49660.2025.10890334

Chu, Annie; O’Reilly, Patrick; Barnett, Julia; Pardo, Bryan (April 2025, IEEE)

This work introduces Text2FX, a method that leverages CLAP embeddings and differentiable digital signal processing to control audio effects, such as equalization and reverberation, using open-vocabulary natural language prompts (e.g., "make this sound in-your-face and bold"). Text2FX operates without retraining any models, relying instead on single-instance optimization within the existing embedding space, thus enabling a flexible, scalable approach to open-vocabulary sound transformations through interpretable and disentangled FX manipulation. We show that CLAP encodes valuable information for controlling audio effects and propose two optimization approaches using CLAP to map text to audio effect parameters. While we demonstrate with CLAP, this approach is applicable to any shared text-audio embedding space. Similarly, while we demonstrate with equalization and reverberation, any differentiable audio effect may be controlled. We conduct a listener study with diverse text prompts and source audio to evaluate the quality and alignment of these methods with human perception. Demos and code are available at anniejchu.github.io/text2fx.
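The single-instance optimization described in this abstract can be illustrated with a deliberately small toy. This is a sketch under loud assumptions, not the Text2FX code: a made-up two-number "embedding" (low-band and high-band level) stands in for CLAP, a single high-shelf gain stands in for the effect chain, and central finite differences stand in for backpropagation through differentiable DSP. Every name here (`apply_shelf`, `fit_gain`, `text_vec`) is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def apply_shelf(bands, gain_db):
    """Hypothetical effect: scale the high band by gain_db decibels."""
    low, high = bands
    return (low, high * 10 ** (gain_db / 20))

def fit_gain(bands, text_vec, steps=200, step_size=100.0, eps=1e-3):
    """Gradient ascent on similarity between the processed audio's toy
    'embedding' and the text vector. No model weights change; only the
    effect parameter for this one audio example is optimized."""
    gain_db = 0.0
    for _ in range(steps):
        up = cosine(apply_shelf(bands, gain_db + eps), text_vec)
        down = cosine(apply_shelf(bands, gain_db - eps), text_vec)
        gain_db += step_size * (up - down) / (2 * eps)  # finite-difference gradient
    return gain_db

# Assumed toy inputs: dull audio (weak high band) and a "bright" text target.
gain = fit_gain(bands=(1.0, 0.5), text_vec=(1.0, 2.0))
```

In this toy the similarity is maximized when the high band is four times louder, so the fitted gain settles near 20·log10(4) dB. A real system would replace the two-number vectors with embeddings from a shared text-audio model.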
Free, publicly-accessible full text available April 6, 2026

Code Drift: Towards Idempotent Neural Audio Codecs

https://doi.org/10.1109/ICASSP49660.2025.10890096

O’Reilly, Patrick; Seetharaman, Prem; Su, Jiaqi; Jin, Zeyu; Pardo, Bryan (April 2025, IEEE)

Neural codecs have demonstrated strong performance in high-fidelity compression of audio signals at low bitrates. The token-based representations produced by these codecs have proven particularly useful for generative modeling. While much research has focused on improvements in compression ratio and perceptual transparency, recent works have largely overlooked another desirable codec property: idempotence, the stability of compressed outputs under multiple rounds of encoding. We find that state-of-the-art neural codecs exhibit varied degrees of idempotence, with some degrading audio outputs significantly after as few as three encodings. We investigate possible causes of low idempotence and devise a method for improving idempotence through fine-tuning a codec model. We then examine the effect of idempotence on a simple conditional generative modeling task, and find that increased idempotence can be achieved without negatively impacting downstream modeling performance, potentially extending the usefulness of neural codecs for practical file compression and iterative generative modeling workflows.
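Idempotence as defined in this abstract can be checked by re-encoding a codec's own output and measuring how far it moves. The sketch below is only an illustration of that measurement, not the paper's evaluation: `toy_codec` and `quantizer` are hypothetical stand-ins for a neural codec's encode/decode round trip, and RMS distance is an assumed metric.

```python
import math

def toy_codec(signal, step=0.1, gain=0.9):
    """Hypothetical lossy round trip: attenuate, then quantize.
    The attenuation compounds on every pass, so outputs drift."""
    return [round(x * gain / step) * step for x in signal]

def quantizer(signal, step=0.1):
    """Plain uniform quantizer: re-quantizing its own output changes nothing."""
    return [round(x / step) * step for x in signal]

def drift(codec, signal, rounds):
    """RMS distance between the first round-trip output and the output
    after each additional round trip. All zeros means idempotent."""
    first = codec(signal)
    current = first
    distances = []
    for _ in range(rounds):
        current = codec(current)
        err = sum((a - b) ** 2 for a, b in zip(first, current)) / len(signal)
        distances.append(math.sqrt(err))
    return distances

# One cycle of a sine wave as an assumed test signal.
signal = [math.sin(2 * math.pi * n / 16) for n in range(16)]
drifting = drift(toy_codec, signal, rounds=5)
stable = drift(quantizer, signal, rounds=5)
```

Here `drifting` grows with each pass while `stable` stays at zero, which is the contrast between a non-idempotent and an idempotent round trip.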
Free, publicly-accessible full text available April 6, 2026

HARP 2.0: Expanding Hosted, Asynchronous, Remote Processing for Deep Learning in the DAW

Benetatos, Christodoulos; Cwitkowitz, Frank; Pruyne, Nathan; Garcia, Hugo Flores; O’Reilly, Patrick; Duan, Zhiyao; Pardo, Bryan (November 2024, ISMIR 2024 Late Breaking and Demo)

Full Text Available
